Adjusting for a confounding variable when comparing means

Sometimes you are aware that the outcome you are comparing, such as reduction in blood pressure, is

influenced by not only a treatment approach (such as drug A compared to drug B), but also by other

confounding variables (such as age, whether the patient has diabetes, whether the patient smokes

tobacco, and so on). These confounders are considered nuisance variables because they have a

known impact on the outcome, and may be more prevalent in some groups than others. If a large

proportion of the group on drug A were over age 65, and only a small proportion of those on drug B

were over age 65, older age would have an influence on the outcome that would not be attributable to

the drug. Such a comparison would be confounded by age. (See Chapter 20 for a comprehensive review

of confounding.)

When you are comparing means between groups, you are doing a bivariate comparison, meaning you

are involving only two variables: the group variable and the outcome. Adjusting for confounding must

be done through a multivariate analysis using regression.
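
To see why the adjustment matters, here is a minimal sketch in Python using only the standard library. Everything in it is hypothetical: the ages, the noise-free formula that generates the blood pressure reductions, and the helper solve_linear_system are all assumptions made for illustration. In practice you would let statistical software fit the regression; the hand-rolled least-squares step is there only to keep the example self-contained.

```python
from statistics import mean

# Hypothetical ages: the drug A group is mostly older than the drug B group.
ages_a = [70, 72, 68, 75, 45, 66]
ages_b = [40, 50, 45, 55, 70, 48]

def simulated_reduction(on_drug_a, age):
    # Assumed, noise-free model for illustration only: drug A adds 3 mmHg
    # of reduction, and each year of age removes 0.2 mmHg of reduction.
    return 20.0 + 3.0 * on_drug_a - 0.2 * age

rows = [(1, age) for age in ages_a] + [(0, age) for age in ages_b]
y = [simulated_reduction(d, a) for d, a in rows]

# Bivariate comparison: difference in mean reduction, ignoring age.
unadjusted_diff = mean(y[:len(ages_a)]) - mean(y[len(ages_a):])

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

# Regression of reduction on drug AND age, fitted through the
# normal equations (X'X) b = X'y with columns [1, drug, age].
X = [[1.0, float(d), float(a)] for d, a in rows]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
intercept, drug_effect, age_slope = solve_linear_system(xtx, xty)

print(f"Unadjusted difference (A - B): {unadjusted_diff:.2f}")
print(f"Age-adjusted drug effect:      {drug_effect:.2f}")
```

In this made-up data set, the unadjusted difference is close to zero because the older drug A group loses about as much reduction to age as it gains from the drug, while the age-adjusted coefficient recovers the assumed 3 mmHg effect.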

Comparing means from sets of matched numbers

Often when biostatisticians consider comparing means between two or more groups, they are thinking

of independent samples of data. When dealing with study participants, independent samples means that

the data you are comparing come from different groups of participants who are not connected to each

other statistically or literally. But in some scenarios, your intention is to compare means from matched

data, meaning some sort of pairing exists in the data. Here are some common examples of matched

data:

• The values come from the same participants, but at two or more different times, such as before and

after some kind of treatment, intervention, or event.

• The values come from a crossover clinical trial, in which the same participant receives two or

more treatments at two or more consecutive phases of the trial.

• The values come from two or more different participants who have been paired, or matched, in

some way as part of the study design. For example, in a study of participants who have

Alzheimer’s disease compared to healthy participants, investigators may choose to age-match each

Alzheimer’s patient to a healthy control when they recruit so both groups have the same age

distribution.

Comparing means of matched pairs

If you have paired data, you must use a paired comparison. Paired comparisons are usually handled by

the paired Student t test that we describe later in this chapter under “Surveying Student t tests.” If your

paired differences aren’t normally distributed, you can use the nonparametric Wilcoxon Signed-Ranks test instead.
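
As a rough sketch of what a paired comparison computes, the Python snippet below works entirely from the within-pair differences. The pre- and post-treatment blood pressure values are hypothetical, and the hypothesized mean difference of 0 is an assumption chosen for illustration. The snippet stops at the t statistic and its degrees of freedom, because turning those into a p value requires the Student t distribution, which statistical software provides.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical systolic blood pressure for five participants,
# measured before and after treatment (same order in both lists).
pre = [150, 142, 160, 155, 148]
post = [140, 138, 150, 149, 141]

# Step 1: one difference per participant (post minus pre).
differences = [after - before for before, after in zip(pre, post)]

# Step 2: test whether the mean difference departs from the
# hypothesized value (assumed here to be 0, meaning no change).
hypothesized_value = 0.0
n = len(differences)
mean_diff = mean(differences)
se_diff = stdev(differences) / sqrt(n)  # standard error of the mean difference
t_stat = (mean_diff - hypothesized_value) / se_diff
degrees_of_freedom = n - 1

print(f"Mean difference: {mean_diff:.2f}")
print(f"t = {t_stat:.3f} with {degrees_of_freedom} degrees of freedom")
```

Because the pairing is carried by list position, the two lists must stay in the same participant order; shuffling one of them would silently destroy the matching.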

The paired Student t test and the one-group Student t test are actually the same test. When you

run a paired t test, the statistical software first calculates the difference between each pair of

numbers. If comparing a post-treatment value to a pretreatment value, the software would start by

subtracting one value from the other for each participant. Finally, the software would run a test to

see if the mean of those differences was statistically significantly different from the hypothesized value